Cream of the Crop 22

/ Cream of the Crop 22 / Cream of the Crop 22.iso / os2 / tton1770.zip / templeton.cfg

Text File | 1996-10-20 | 14KB | 316 lines

# *************************************************************
# Templeton, copyright 1995, 1996 N.A. Krawetz
# All rights reserved.
# *************************************************************
# configuration for Templeton
#
# Lines beginning with a '#' are comments and are ignored.
# Lines should not be more than 80 characters.
# Operands in this file are in the form:
#   parameter value
# The parameter is case insensitive, except where a text string or URL
# is required.
# Boolean values ("true" or "false") are case insensitive.
# Numeric values should be numbers -- non-numbers are regarded as 0.
# All other types of values ARE case sensitive.
# ******************** Registration ****************************
# Register: registration code
# Software that is registered contains a unique registration
# code. This code should be entered exactly as it is provided.
# If your site contains multiple registrations, you may list
# each registration code on a line starting with the
# key word "Register".
# Please read the licensing agreement for registration
# information.
# Register 12-34567-891011
# ******************* File System *****************************
# LocalPath: absolute path
# LocalPath informs the program where to store the downloaded files.
# IF this path is:
#   LocalPath none
# THEN no files are generated. Only a log file containing the remote
# server's WWW map is created in the current directory.
#
# Currently, files should be stored in the root directory of the file system.
# For WWW servers, this is the server's root directory.
# (This limitation will be removed in future releases.)
# For DOS based machines, this path may include a drive letter:
#   LocalPath e:\server.www\
#
# Either slash "/" or backslash "\" is valid for specifying a directory.
# The trailing slash or backslash is optional.
#
# This option is only used when the "Interactive" option is FALSE.
LocalPath /
# User: e-mail address
# In case of emergency, this is the person who is running the program
# and who should be contacted to stop the program from running.
# This MUST be a valid e-mail address, and SHOULD also be available with
# a "talk" command.
# As a side note, it is never a good idea to let automatic software run
# unsupervised (especially this type of software). The "User" should be
# available to read their e-mail at all times during the execution of this
# program.
# The default is the account running the program on the current machine.
# User webmaster@host.machine.org
# ********************* Restrictions *****************************
# RestrictHost: boolean
# This parameter informs the program not to leave the designated host. Links
# to machines not on the current host are not traversed.
RestrictHost TRUE
# RestrictPath: absolute path
# This parameter is only used when a host is restricted.
# When a host is restricted, a subpath on that host may also be restricted.
# Hypertext references to documents outside this subtree are not traversed.
# Either slash "/" or backslash "\" is valid for specifying a directory.
# The trailing slash or backslash is optional.
RestrictPath /
# RestrictDepth: numeric value
# Hyperlinks are traversed in a breadth-first search. An unrestricted search
# may download an entire WWW server's data. By restricting the depth,
# only immediate portions of the server will be received.
# Images and non-href links are considered to be at the same depth as the
# document.
# A restricted depth of 0 means no restriction.
# The default is 1.
RestrictDepth 1
# RemoveRestricted: boolean
# This parameter informs the program to remove untraversed links. Links to
# restricted machines or restricted depths are removed from the HTML file,
# but the visible text is still available (just not a hyperlink).
# The default value is FALSE.
RemoveRestricted FALSE
# Add: URL
# Place a specific URL on the list of URLs to process.
# Be aware that restrictions apply.
# Exclusion: boolean
# This parameter determines whether Templeton will support server provided
# robot exclusion files (robots.txt). Many servers maintain exclusion files
# to prevent robots from wandering around virtual directory trees, from
# retrieving very temporary or incomplete files, or copyright materials. It
# is considered "polite" for web agents to obey the exclusion files when they
# exist. The default value, TRUE, means that robot exclusion files are obeyed.
# Setting Exclusion to FALSE will ignore robot exclusion files.
Exclusion TRUE
# Deny: URL
# The URL provided, as well as all subtrees of the URL, are not processed.
# Many times specific directory subtrees are not desirable. You can deny
# retrieval of these URLs using this setting.
# For example, to NOT retrieve the "archive" subtree of the host loco.com,
# you would specify:
#   Deny http://loco.com/archive/
# If you do not include the trailing slash (http://loco.com/archive) then
# all subdirectories beginning with "archive" are not processed. This
# includes "archive.1", "archive.old", "archive_from_1994", etc.
# Multiple Deny statements may be specified.
# Allow: URL
# Similar to "Deny", "Allow" explicitly specifies that a subtree is
# retrievable. When used in conjunction with Deny URL, branches of a
# subtree may be specified for access, while other subtrees are ignored.
# Multiple Allow statements may be specified.
# Sleep: numeric
# Sleep determines the number of seconds to pause before sending a request to
# a WWW server. SLEEP IS IMPORTANT.
# Warning: Templeton can generate thousands of requests per minute. Many
# WWW servers cannot handle a sudden onslaught of requests. Setting the
# Sleep parameter to 0 (zero) may generate too many requests for the server
# and kill the server. This is bad.
# A sleep setting of 0 (zero) is known to kill the following types of servers:
#   All WWW servers that run under Microsoft Windows (TM)
#   Old generation (HTML/1.0) CERN servers on all platforms
# Low sleep values may also generate large amounts of network traffic and
# hog network resources.
# For safety, you should set the sleep interval to at least 5 seconds.
# The longer, the better. Remember, this program is automated and can
# easily run for hours. What's the rush?
Sleep 10
# ********************* Network *****************************
# ProxyHost: hostname or IP address
# Proxy agents are machines that act as a gateway through a firewall.
# If your local network uses a proxy agent, specify the name of
# the proxy agent here. If you are uncertain about your network, consult your
# network manager or provider.
# A proxy server is only used when a server is specified.
# ProxyHost proxyhost.network.net
# ProxyPort: integer
# When using a proxy server (see ProxyHost), the port on the proxy server
# should be specified. The default port is 80. This value is not
# used if no proxy host is specified with ProxyHost.
ProxyPort 80
# Spoof: text-string
# Some WWW servers make incorrect assumptions about the browser/robots. (Most
# of these are the Netscape servers.) These servers assume that, since the
# browser is not "Netscape" the browser cannot handle the HTML documents and
# therefore, the document is not transferred. By "spoofing" a different name,
# the WWW robot can use a qualified browser name to retrieve the HTML
# document.
# NOTE: The first word of the spoof-name is used for restrictions when
# robot exclusion is honored (see Exclusion). This means, if Templeton tells
# the WWW server that it is "Netscape" and the server does not permit
# Netscape browsers, then the server will also not permit Templeton.
# Common spoof names (and browsers) are:
#   Mozilla      Netscape Browser
#   WebCrawler   WebCrawler robot
#   InfoSeek     InfoSeek robot
#   WebExplorer  IBM WebExplorer for OS/2
#   Harvest      a web robot
#   Mosaic       NCSA Mosaic
#   Lynx         Lynx, text browser
#   Microsoft    Internet Explorer
#   PRODIGY-WB   Prodigy browser
Spoof Mozilla (Templeton)
# ********************* Preferences *****************************
# FATFormat: boolean
# Determines the file name format for the current operating system.
# DOS based machines using drives formatted with a File Allocation Table (FAT)
# can only handle file names containing 8 characters and a 3 character
# extension. Setting this option to TRUE will generate 8.3 character file
# names. The default is FALSE, and will generate unlimited length file names.
# NOTE: Under DOS, this option is always TRUE (DOS only supports FAT file
# names). Under OS/2, this value becomes TRUE automatically if the destination
# path (LocalPath) is located on a FAT partition.
FATFormat FALSE
# FileOverwrite: boolean
# Files that already exist on the local system are not normally downloaded.
# Setting the FileOverwrite option to TRUE will overwrite files on the
# local file system. Default value is FALSE.
FileOverwrite TRUE
# Index: file name
# For hypertext references that only specify a directory, this is the
# default html file in the directory.
# NOTE: if FATFormat is TRUE, the 8.3 name translation will be applied to
# this file name.
# The default name is "index.html".
Index index.html
# ISMAP: absolute path to executable
# For WWW servers, many imagemaps use a program that takes coordinates from
# a selected image <IMG SRC=... ISMAP> and returns a new URL. Some of the
# more common methods use a data file containing known coordinates and a
# program to identify which URL is activated. Commonly, this program is
# called "imagemap" or "imagemap.exe".
# The ISMAP parameter specifies the WWW server's path to the imagemap program.
ISMAP /cgi-bin/imagemap
# MapType: NCSA or CERN
# For the executable specified in the ISMAP parameter (see above), this
# option determines the format of the file. If the image map file can be
# retrieved, then it is converted into this specified format.
# Valid options are either "CERN" or "NCSA". The default is NCSA.
MapType NCSA
# ********************* Logging *****************************
# Mailto-File: file name
# Similar to "Server-File" logging, the file name listed on the "Mailto-File"
# line contains a list of e-mail addresses found in the HTML documents. Only
# e-mail addresses that are active (hyperlinks) are used. E-mail addresses
# displayed as plain text in the document or contained in CGI scripts are not
# listed in the mailto logfile.
# NOTE: This list MAY contain duplicate entries. Duplication removal may be
# added in later versions.
# (Some people have found this to be a very useful feature for generating
# mailing lists.)
# The default is no mailto logging.
# Mailto-File mailtolist
# RemoteMapping: boolean
# Determines whether remote mapping will be done. The default is TRUE,
# which does perform mapping. The map file name is mapindex.html and is
# either located at the root of the LocalPath or in the current directory
# if the system is not mirroring files.
# Note: if you change the default index name, for example, to "welcome.html"
# then the default map file will be "mapwelcome.html".
RemoteMapping TRUE
# Server-File: file name
# A data file is generated containing the host name, IP address, and
# WWW server type for each server visited. For servers listed as IP
# address only, the host name is also the IP address.
# The default is no server logging.
# Server-File serverlist
# ********************* Advanced *****************************
# The advanced configuration commands should be used with caution.
# These commands allow other applications to perform tasks on the
# retrieved documents.
# Applications that are spawned (operate concurrently) with Templeton may
# overwhelm the user or operating system.
# Spawned applications include those begun with "start" under OS/2,
# or followed by "&" under Unix.
# NOTE: Templeton has the capability to spawn thousands of applications
# in a few seconds.
# On Unix-type systems, Templeton introduces security risks when executed
# as root.
# For applications that are not spawned, Templeton will pause until
# the application has ended. This allows for a guaranteed order of processing
# for the called applications.
# Command_html: string
# Execute a system command on each HTML document stored on the file system.
# This may be useful for counting documents, storing statistics, printing,
# converting, etc.
# The string should contain the executable to run and a %s for the file name.
# The string "none" turns off this command. This is the default.
# For example: to convert all HTML documents to text using the program
# html2txt (not provided with the Templeton distribution), you would use:
#   Command_html html2txt %s
# Command_image: string
# Execute a system command on each image-file stored on the file system.
# Similar to Command_html, Command_image is executed on all image files.
# This may be useful for counting documents, storing statistics, printing,
# converting, etc. NOTE: no distinction is made between different image
# formats.
# The string should contain the executable to run and a %s for the file name.
# The string "none" turns off this command. This is the default.
# Command_map: string
# Execute a system command on each image-map stored on the file system.
# Similar to Command_html, Command_map is executed on all image-map files.
# This may be useful for counting documents, storing statistics, or converting.
# The string should contain the executable to run and a %s for the file name.
# The string "none" turns off this command. This is the default.
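# Illustrative sketch (not part of the original distribution): "imgstat" and
# "mapstat" are hypothetical program names, used only to show the form of
# these settings. Written without "start" or "&", each command is not
# spawned, so Templeton pauses until it has ended:
#   Command_image imgstat %s
#   Command_map mapstat %s
# Assuming the "start" prefix described above is written as part of the
# command string, a spawned (concurrent) variant under OS/2 might read:
#   Command_image start imgstat %s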
# Command_default: string
# Execute a system command on each file stored on the file system.
# Similar to Command_html, Command_default is executed on all files that have
# no other executable specified. This may be useful for counting documents,
# storing statistics, printing, converting, etc.
# The string should contain the executable to run and a %s for the file name.
# The string "none" turns off this command. This is the default.
# Interactive: boolean
# Determines whether the user should be prompted for
# configuration information or if Templeton should
# start running automatically.
# The default setting is TRUE.
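# Illustrative sketch (an assumption, not part of the original file): one
# possible set of lines for an unattended mirror of a single subtree,
# combining the parameters described in this file. The host reuses the
# loco.com example from the Deny section; the "/docs/" path and the depth
# of 2 are hypothetical values. How Templeton derives the designated host
# from an Add line is also assumed here, not documented in this file.
#   Interactive FALSE
#   LocalPath e:\server.www\
#   RestrictHost TRUE
#   RestrictPath /docs/
#   RestrictDepth 2
#   Exclusion TRUE
#   Sleep 10
#   Add http://loco.com/docs/
# Interactive is set to FALSE because LocalPath is only used in that mode.
# Sleep stays well above the 5 second minimum recommended for safety.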